Working with objects

ANU BDSI
workshop
Introduction to R programming

Emi Tanaka

Biological Data Science Institute

3rd April 2024

Current learning objective

  • -Conduct elementary arithmetic operations using R
  • -Navigate the RStudio integrated development environment (IDE)
  • -Install external packages in R to extend functionality
  • Comprehend various object types in R
  • Manipulate lists, matrices, and vectors in R
  • Compute basic summary statistics including mean, median, quartiles, and standard deviation using R
  • Grasp the concept of missing values within the R environment
  • -Import and export data in R
  • -Create basic functions, employ conditional statements, and utilize for loops in R
  • -Decipher error messages and do basic troubleshooting

Assignment to object

  • You can assign values to objects using <- or = or even ->
  • Just be consistent which one you use!
  • The name of the object can be anything so long as it is syntactically valid (no spaces, almost no special characters, and the name cannot start with a digit)
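
A minimal sketch of the three assignment forms. The object names x, y and z are illustrative and do not come from the workshop material; make.names() is used only to show how R would repair an invalid name.

```r
# Three equivalent ways to assign the value 3 (illustrative object names)
x <- 3
y = 3
3 -> z

# All three objects hold the same value
identical(x, y)  # TRUE
identical(y, z)  # TRUE

# A name with a space and a leading digit is not syntactically valid;
# make.names() shows a valid version of it
make.names("2nd value")  # "X2nd.value"
```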

Vectors

  • We can combine scalars to form vectors using c():
  • This is a vector of length 3
  • This vector has type double and class numeric
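
The points above can be sketched as follows, assuming an illustrative vector v (the values are made up for this example):

```r
# Combine three scalars into one vector
v <- c(1.5, 2, 10)

length(v)  # 3
typeof(v)  # "double"
class(v)   # "numeric"
```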

Vector types

  • There are four primary types of atomic vectors: logical, integer, double and character.
  • The integer and double vectors are collectively called numeric vectors.
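
A short sketch of the four types, using illustrative object names. The L suffix is how R marks an integer literal.

```r
lgl <- c(TRUE, FALSE)   # logical
int <- c(1L, 2L)        # integer (note the L suffix)
dbl <- c(1, 2.5)        # double
chr <- c("a", "b")      # character

typeof(lgl)  # "logical"
typeof(int)  # "integer"
typeof(dbl)  # "double"
typeof(chr)  # "character"

# Integer and double vectors both count as numeric
is.numeric(int)  # TRUE
is.numeric(dbl)  # TRUE
is.numeric(chr)  # FALSE
```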

Vector

  • All elements of an atomic vector must be of the same type
  • If you combine mismatched types, R coerces all values to a single common type.
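
A minimal sketch of coercion when types are mixed in c(). The values are illustrative; the result takes the most general type present (logical, then integer, then double, then character).

```r
# logical + integer -> integer
a <- c(TRUE, 2L)
typeof(a)  # "integer"

# integer + double -> double
b <- c(1L, 2.5)
typeof(b)  # "double"

# anything + character -> character
d <- c(1, "a", TRUE)
typeof(d)  # "character"
d          # "1" "a" "TRUE"
```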

Casting to other types

  • as.numeric() tries to coerce its input to numeric values.
  • If a logical value is coerced to numeric or integer, then
    • TRUE is 1 and
    • FALSE is 0.
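
A sketch of explicit casting with illustrative inputs. The last line is an added illustration: input that cannot be read as a number becomes NA (R also raises a warning, suppressed here).

```r
as.numeric(c("1", "2.5"))   # 1.0 2.5
as.numeric(TRUE)            # 1
as.integer(c(TRUE, FALSE))  # 1 0

# Because TRUE is 1 and FALSE is 0, sum() counts the TRUE values
sum(c(TRUE, FALSE, TRUE))   # 2

# Text that is not a number cannot be converted
bad <- suppressWarnings(as.numeric("abc"))
bad                         # NA
```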

Factor

  • A factor in R is a special type of integer vector used typically to encode categorical variables.
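
A minimal sketch of a factor, assuming an illustrative categorical variable size with three levels:

```r
size <- factor(c("small", "large", "small", "medium"),
               levels = c("small", "medium", "large"))

levels(size)      # "small" "medium" "large"
class(size)       # "factor"
typeof(size)      # "integer"

# The underlying integer codes index into the levels
as.integer(size)  # 1 3 1 2
```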

Lists

  • Lists allow you to combine elements of different types.
  • You can use str() to see the internal structure of an object in R.
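
A short sketch with an illustrative list l that holds an integer vector, a character string and a logical vector:

```r
l <- list(1:3, "a", c(TRUE, FALSE))

length(l)  # 3
class(l)   # "list"

# Print a compact description of each element and its type
str(l)
```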

Data frames

  • data.frame is a special type of named list where each element of the list is a vector of the same length.
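
A minimal sketch of this idea, assuming an illustrative data frame df with made-up columns:

```r
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 passed = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

typeof(df)   # "list"
class(df)    # "data.frame"
names(df)    # "id" "name" "passed"

# Every column (list element) has the same length
lengths(df)
nrow(df)     # 3
ncol(df)     # 3
```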

Subsetting vectors Part 1

  • Positive integers select elements at the specified positions:
  • Negative integers exclude elements at the specified positions:
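
Both cases can be sketched with an illustrative vector x:

```r
x <- c(10, 20, 30, 40, 50)

# Positive integers: keep positions 1 and 3
x[c(1, 3)]   # 10 30

# Negative integers: drop positions 1 and 3
x[-c(1, 3)]  # 20 40 50
```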

Subsetting vectors Part 2

  • Logical vectors select elements where logical value is TRUE.
  • If the logical vector used for subsetting is shorter than the vector being subset, it is recycled to match that vector's length.
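
A sketch of logical subsetting and recycling, again with an illustrative vector x:

```r
x <- c(10, 20, 30, 40, 50)

# One logical value per element
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]  # 10 30 50

# A comparison produces a logical vector of the right length
x[x > 25]                             # 30 40 50

# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE TRUE
x[c(TRUE, FALSE)]                     # 10 30 50
```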

Subsetting named vectors

  • Character vectors select elements based on the names of the vector's elements (if any):
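
A minimal sketch, assuming an illustrative named vector h:

```r
h <- c(alice = 160, bob = 175, carol = 168)

names(h)                 # "alice" "bob" "carol"

# Select by name; the result keeps its names
h[c("alice", "carol")]   # alice = 160, carol = 168

# Double brackets return a single value without its name
h[["bob"]]               # 175
```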

Subsetting lists
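
A minimal sketch of list subsetting, assuming an illustrative list l. Single brackets return a (smaller) list, while double brackets return the element itself.

```r
l <- list(1:3, "a", c(TRUE, FALSE))

# Single brackets: the result is still a list
l[1]          # a list of length 1
l[c(1, 3)]    # a list of length 2
l[-2]         # everything except the second element

# Double brackets: the result is the element itself
l[[1]]        # 1 2 3
```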

Subsetting named lists
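
A minimal sketch of subsetting by name, assuming an illustrative named list p. Elements can be extracted with $ or [[ ]], and a sub-list can be selected with a character vector inside [ ].

```r
p <- list(name = "Ada", scores = c(90, 85), enrolled = TRUE)

# Extract one element
p$name                    # "Ada"
p[["scores"]]             # 90 85

# Select a sub-list by name
p["name"]                 # a list of length 1
p[c("name", "enrolled")]  # a list of length 2
```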

Subsetting data frames
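
A minimal sketch of data frame subsetting, assuming an illustrative data frame df. Because a data frame is a list, the list tools work on columns; [row, column] indexing additionally selects rows.

```r
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 score = c(90, 85, 70),
                 stringsAsFactors = FALSE)

# Columns, as for a named list
df$score       # 90 85 70 (a vector)
df[["name"]]   # "a" "b" "c" (a vector)
df["score"]    # a data frame with one column

# Rows and columns with [row, column]
df[2, ]        # second row, all columns
df[2, "name"]  # "b"

# Rows where a condition holds, keeping two columns
high <- df[df$score > 80, c("id", "score")]
high
```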

Numerical summaries

  • Functions for numerical summaries generally come from the base or stats packages.
  • Some common numerical summaries include:
    • Mean: mean()
    • Median: median()
    • Five number summary: fivenum()
    • Minimum: min()
    • Maximum: max()
    • Quantile: quantile()
    • Correlation coefficient: cor()
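
The functions above can be sketched on two illustrative vectors x and y (made-up values). sd() is included because standard deviation appears in the learning objectives.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)
y <- c(1, 3, 2, 5, 6, 8, 12)

mean(x)     # 6
median(x)   # 5
min(x)      # 2
max(x)      # 11

# Minimum, lower hinge, median, upper hinge, maximum
fivenum(x)  # 2 4 5 8 11

# Quartiles
quantile(x, probs = c(0.25, 0.5, 0.75))

# Standard deviation
sd(x)

# Correlation coefficient between x and y
cor(x, y)
```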

Missing values

  • NA in R denotes missing values – there are in fact different types of missing values (NA_character_, NA_integer_, NA_real_, NA_complex_).
  • When there are missing values, many computations return NA instead of a number.
  • Below we remove the missing values:
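
A minimal sketch with an illustrative vector x that contains one missing value. Two common ways to drop missing values are shown: the na.rm argument and subsetting with is.na().

```r
x <- c(3, NA, 5, 10)

# The missing value propagates through the computation
mean(x)                # NA

# Locate the missing values
is.na(x)               # FALSE TRUE FALSE FALSE

# Option 1: ask the function to remove missing values
mean(x, na.rm = TRUE)  # 6

# Option 2: remove them yourself before computing
mean(x[!is.na(x)])     # 6

# Each NA variant has its own type
typeof(NA)             # "logical"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"
```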

Summary